Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text

نویسندگان

  • Tarek Elghazaly
  • Aly Fahmy
چکیده

This paper provides a novel model for English/Arabic Query Translation to search Arabic text, and then expands the Arabic query to handle Arabic OCR-Degraded Text. This includes detection and translation of word collocations, translating single words, transliterating names, and disambiguating translation and transliteration through different approaches. It also expands the query with the expected OCR-Errors that are generated from the Arabic OCRErrors simulation model which proposed inside the paper. The query translation and expansion model has been supported by different libraries proposed in the paper like a Word Collocations Dictionary, Single Words Dictionaries, a Modern Arabic corpus, and other tools. The model gives high accuracy in translating the Queries from English to Arabic solving the translation and transliteration ambiguities and with orthographic query expansion; it gives high degree of accuracy in handling OCR errors.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

English/Arabic Cross Language Information Retrieval (CLIR) for Arabic OCR-Degraded Text

In this paper, a novel for Query Translation and Expansion for enabling English/Arabic CLIR for both normal and OCR-Degraded Arabic Text model has been proposed, implemented, and tested. First, an English/Arabic Word Collocations Dictionary has been established plus reproducing three English/Arabic Single Words Dictionaries. Second, a modern Arabic Corpus has been built. Third, a model for simu...

متن کامل

Novel Arabic OCR Degraded Text Retrieval Model

This paper provides a novel model enhances the Arabic OCR degraded text retrieval effectiveness. The model simulates the Arabic OCR recognition mistakes happens while the recognition process based on word based approach. Then using the expected OCR errors the model expands the user search query. The resulting expanded search query produced higher precision and recall in searching Arabic OCRDegr...

متن کامل

An Enhanced Arabic OCR Degraded Text Retrieval Model

This paper provides a new model enhancing the Arabic OCR degraded text retrieval effectiveness. The proposed model based on simulating the Arabic OCR recognition mistakes on a word based approach. Then the model expands the user search query using the expected OCR errors. The resulting expanded search query gives higher precision and recall in searching Arabic OCR-Degraded text rather than the ...

متن کامل

Probabilistic Retrieval of OCR Degraded Text Using N-Grams

Abst rac t . The retrieval of OCR degraded text using n-gram formulations within a probabilistic retrieval system is examined in this paper. Direct retrieval of documents using mgram databases of 2 and 3-grams or 2, 3, 4 and 5-grams resulted in improved retrieval performance over standard (word based) queries on the same data when a level of 10 percent degradation or worse was achieved. A secon...

متن کامل

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009